Performance Analysis of Runtime Data Declustering over SAN-Connected PC Cluster

Authors

  • Masato Oguchi
  • Masaru Kitsuregawa
Abstract

Personal computer/workstation (PC/WS) clusters have come to be studied intensively in the field of parallel and distributed computing. Because of their good scalability and cost/performance ratio, they are expected to play an important role as next-generation large-scale computer systems, such as large server sites and/or high-performance parallel computers. From the viewpoint of applications, data-intensive applications, including data mining and ad hoc query processing in databases, are considered very important for high-performance computing, in addition to conventional scientific calculation. Investigating the feasibility of such applications on a PC cluster is therefore meaningful. In this paper, a PC cluster connected with a Storage Area Network (SAN) is built and evaluated. In a SAN cluster, each node can access all shared disks directly without using the LAN; thus, SAN clusters achieve much better performance than LAN clusters for disk access operations. However, if many nodes access the same shared disk simultaneously, application performance degrades because of the I/O bottleneck. To resolve this problem, a runtime data declustering method is proposed, in which data is declustered to several other disks dynamically during the execution of the application. Parallel data mining is implemented and evaluated on the SAN-connected PC cluster. This application requires iterative scans of a shared disk, which severely degrade execution performance because of the I/O bottleneck. The runtime data declustering method is applied, and characteristics of the system, such as I/O and network operations, are evaluated in detail. According to the experimental results, the proposed method prevents the performance degradation caused by the shared-disk bottleneck in SAN clusters.
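
The bottleneck described in the abstract can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the round-robin placement policy, the function names `decluster` and `pass_time`, and the cost model (each disk serves one block request at a time, every node scans every block once per pass) are all hypothetical and do not come from the paper.

```python
def decluster(num_blocks, num_disks):
    """Assumed policy: round-robin placement, block i goes to disk i % num_disks."""
    placement = {d: [] for d in range(num_disks)}
    for block in range(num_blocks):
        placement[block % num_disks].append(block)
    return placement


def pass_time(num_blocks, num_nodes, num_disks, block_time=1.0):
    """Estimate the time for one full scan by every node.

    Toy cost model (an assumption): a disk serves block requests one at a
    time, so the pass finishes when the busiest disk has served all of its
    requests, i.e. (blocks on that disk) * (number of nodes) requests.
    """
    placement = decluster(num_blocks, num_disks)
    busiest = max(len(blocks) for blocks in placement.values())
    return busiest * num_nodes * block_time


if __name__ == "__main__":
    # One shared disk versus the same data spread over four disks.
    print("single shared disk:", pass_time(8, 4, 1))
    print("declustered over 4 disks:", pass_time(8, 4, 4))
```

In this model, spreading the blocks over more disks shortens every later scan, which is why an application that scans the same data repeatedly could recover the one-time cost of moving the data. The actual cost of copying blocks at run time is not modeled here.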

Similar resources

Runtime Data Declustering based on Bandwidth-on-Demand and its Evaluation over SAN-connected PC cluster

Clusters of computers have recently come into use at large-scale server sites because of their good scalability and cost/performance ratio. In addition, Storage Area Networks (SANs) have been introduced in order to consolidate the back end of such systems. The I/O bottleneck is a serious problem in such an environment, because some important data-intensive applications often access part of the data concurrently and repeatedly...

Runtime Data Declustering over SAN-Connected PC Cluster System

Recently, personal computer/workstation (PC/WS) clusters have come to be studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications, including data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, in addition to conventional scientific calculation. Thus, ...

Data mining on PC cluster connected with storage area network: its preliminary experimental results

Personal computer/workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. Because of their good scalability and cost/performance ratio, they are expected to play an important role as large-scale computer systems, such as large server sites and/or high-performance parallel computers. From the viewpoint of applications, data inte...

Run-Time Load Balancing System on SAN-connected PC Cluster for Dynamic Injection of CPU and Disk Resource - A Case Study of Data Mining Application

A PC cluster system is an attractive platform for data-intensive applications. However, a conventional shared-nothing system is limited in its load-balancing performance, and it is difficult to change the number of nodes and disks dynamically during execution. In this paper, we develop dynamic resource injection, where the system can inject CPU power and expand I/O bandwidth by adding nodes and disks dy...

Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments

Personal computer/workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for high-performance computing, as well as conventional scientific calculations. We have built and evaluated PC cluster pil...

Publication date: 2002